56 research outputs found

    Vision-based traffic surveys in urban environments

    This paper presents a state-of-the-art, vision-based vehicle detection and type classification system for performing traffic surveys from a roadside closed-circuit television camera. Vehicles are detected using background subtraction based on a Gaussian mixture model that can cope with vehicles that become stationary over a significant period of time. Vehicle silhouettes are described using a combination of shape and appearance features based on an intensity-based pyramid histogram of orientation gradients (HOG). Classification is performed using a support vector machine, which is trained on a small set of hand-labeled silhouette exemplars. These exemplars are identified using a model-based pre-classifier that utilizes calibrated images mapped by Google Earth to provide accurately surveyed scene geometry matched to visible image landmarks. Kalman filters track the vehicles to enable classification by majority voting over several consecutive frames. The system counts vehicles and separates them into four categories: car, van, bus, and motorcycle (including bicycles). Experiments with real-world data have been undertaken to evaluate system performance; a vehicle detection rate of 96.45% and a classification accuracy of 95.70% have been achieved on this data. The authors gratefully acknowledge the Royal Borough of Kingston for providing the video data. S.A. Velastin is grateful for funding received from the Universidad Carlos III de Madrid, the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement nº 600371, el Ministerio de Economía y Competitividad (COFUND2013-51509) and Banco Santander.
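
    As a rough illustration of the pipeline described above (Gaussian-mixture background subtraction, HOG description of the extracted silhouettes, and SVM classification), the sketch below wires together off-the-shelf OpenCV and scikit-learn components. It is a minimal approximation, not the authors' implementation: the HOG window size, the morphological clean-up and the placeholder training data are all assumptions.

```python
import cv2
import numpy as np
from sklearn.svm import SVC

# Gaussian-mixture background model; history/varThreshold are illustrative values.
bg = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16, detectShadows=True)
# HOG over a fixed-size window: (winSize, blockSize, blockStride, cellSize, nbins).
hog = cv2.HOGDescriptor((64, 64), (16, 16), (8, 8), (8, 8), 9)

# Placeholder training data so the sketch runs end-to-end; in the paper the SVM is
# trained on hand-labelled silhouette exemplars selected by a model-based pre-classifier.
rng = np.random.default_rng(0)
X_train = rng.random((40, 1764)).astype(np.float32)   # 1764 = HOG length for a 64x64 window
y_train = rng.integers(0, 4, 40)                      # 0=car, 1=van, 2=bus, 3=motorcycle
svm = SVC(kernel="rbf").fit(X_train, y_train)

def detect_and_classify(frame, min_area=500):
    """Return (bounding box, predicted class id) for each moving object in a frame."""
    mask = bg.apply(frame)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((3, 3), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    results = []
    for c in contours:
        if cv2.contourArea(c) < min_area:
            continue
        x, y, w, h = cv2.boundingRect(c)
        patch = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
        feat = hog.compute(cv2.resize(patch, (64, 64))).reshape(1, -1)
        results.append(((x, y, w, h), int(svm.predict(feat)[0])))
    return results
```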

    Learning to recognise 3D human action from a new skeleton-based representation using deep convolutional neural networks

    Recognising human actions in untrimmed videos is an important and challenging task. An effective three-dimensional (3D) motion representation and a powerful learning model are two key factors influencing recognition performance. In this study, the authors introduce a new skeleton-based representation for 3D action recognition in videos. The key idea of the proposed representation is to transform the 3D joint coordinates of the human body carried in skeleton sequences into RGB images via a colour encoding process. By normalising the 3D joint coordinates and dividing each skeleton frame into five parts, where the joints are concatenated according to the order of their physical connections, the colour-coded representation is able to represent spatio-temporal evolutions of complex 3D motions, independently of the length of each sequence. They then design and train different deep convolutional neural networks based on the residual network architecture on the obtained image-based representations to learn 3D motion features and classify them into classes. Their proposed method is evaluated on two widely used action recognition benchmarks: MSR Action3D and NTU-RGB+D, a very large-scale dataset for 3D human action recognition. The experimental results demonstrate that the proposed method outperforms previous state-of-the-art approaches while requiring less computation for training and prediction.
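
    The core encoding step described here, normalising the 3D joint coordinates and mapping x/y/z to the R/G/B channels of a fixed-size image, can be sketched in a few lines of NumPy. The sketch below is a simplified, hypothetical version: it omits the five-part body grouping and uses a plain resize to handle variable sequence lengths.

```python
import numpy as np
from PIL import Image

def skeleton_to_rgb(seq, out_size=(224, 224)):
    """Encode a (frames, joints, 3) skeleton sequence as an RGB image: x/y/z -> R/G/B."""
    seq = np.asarray(seq, dtype=np.float32)           # (T, J, 3)
    lo, hi = seq.min(axis=(0, 1)), seq.max(axis=(0, 1))
    norm = (seq - lo) / (hi - lo + 1e-6)              # per-axis normalisation to [0, 1]
    img = (norm * 255).astype(np.uint8)               # rows = frames, columns = joints
    # Resize so sequences of any length map to a fixed-size network input.
    return np.array(Image.fromarray(img).resize(out_size, Image.BILINEAR))

# Example: a random 60-frame sequence with 25 joints (the NTU-RGB+D joint count).
dummy = np.random.rand(60, 25, 3)
print(skeleton_to_rgb(dummy).shape)                   # (224, 224, 3)
```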

    3D-Hog Embedding Frameworks for Single and Multi-Viewpoints Action Recognition Based on Human Silhouettes

    This paper has been presented at the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Given the high demand for automated systems for human action recognition, great efforts have been undertaken in recent decades to progress the field. In this paper, we present frameworks for single and multi-viewpoint action recognition based on the Space-Time Volume (STV) of human silhouettes and 3D-Histogram of Oriented Gradient (3D-HOG) embedding. We exploit fast computational approaches involving Principal Component Analysis (PCA) over the local feature spaces for compactly describing actions as combinations of local gestures, and L2-Regularized Logistic Regression (L2-RLR) for learning the action model from local features. Results on the Weizmann and i3DPost datasets outperform the baseline method and other works, confirming the efficacy of the proposed approaches in terms of accuracy and robustness to appearance changes.
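
    A minimal sketch of the learning stage named in this abstract (PCA over local feature spaces followed by L2-regularised logistic regression) is shown below using scikit-learn; the descriptors, labels and dimensionalities are random placeholders rather than actual 3D-HOG features.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.random((200, 1000))          # stand-in for 3D-HOG descriptors of silhouette STVs
y = rng.integers(0, 10, 200)         # stand-in action labels (e.g. Weizmann classes)

clf = make_pipeline(
    PCA(n_components=64),                                      # compact local feature space
    LogisticRegression(penalty="l2", C=1.0, max_iter=1000),    # L2-RLR action model
)
clf.fit(X, y)
print(clf.score(X, y))
```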

    Evaluation framework for crowd behaviour simulation and analysis based on real videos and scene reconstruction

    This paper has been presented at the 6th Latin-American Conference on Networked and Electronic Media (LACNEM 2015). Crowd simulation has been regarded as an important research topic in computer graphics, computer vision, and related areas. Various approaches have been proposed to simulate real-life scenarios. In this paper, a novel framework that evaluates the accuracy and the realism of crowd simulation algorithms is presented. The framework is based on the concept of recreating real video scenes in 3D environments and applying crowd and pedestrian simulation algorithms to the agents using a plug-in architecture. The real videos are compared with recorded videos of the simulated scene, and novel Human Visual System (HVS) based similarity features and metrics are introduced in order to compare and evaluate simulation methods. The experiments show that the proposed framework provides efficient methods to evaluate crowd and pedestrian simulation algorithms with high accuracy and low cost.
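
    One way to picture the comparison step is a per-frame similarity score between the real video and the recording of the simulated scene. The sketch below uses SSIM, itself an HVS-inspired metric, purely as a stand-in; the paper's own HVS-based features and metrics are more elaborate, and the file paths are hypothetical.

```python
import cv2
from skimage.metrics import structural_similarity as ssim

def compare_videos(real_path, sim_path, max_frames=500):
    """Mean SSIM over aligned grayscale frames of a real and a simulated video."""
    real, sim = cv2.VideoCapture(real_path), cv2.VideoCapture(sim_path)
    scores = []
    while len(scores) < max_frames:
        ok_r, frame_r = real.read()
        ok_s, frame_s = sim.read()
        if not (ok_r and ok_s):
            break
        gray_r = cv2.cvtColor(frame_r, cv2.COLOR_BGR2GRAY)
        gray_s = cv2.cvtColor(frame_s, cv2.COLOR_BGR2GRAY)
        gray_s = cv2.resize(gray_s, gray_r.shape[::-1])   # align resolutions
        scores.append(ssim(gray_r, gray_s))
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical usage: score = compare_videos("real_scene.mp4", "simulated_scene.mp4")
```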

    A Bag of Expression framework for improved human action recognition

    The Bag of Words (BoW) approach has been widely used for human action recognition in recent state-of-the-art methods. In this paper, we introduce what we call a Bag of Expression (BoE) framework, based on the bag of words method, for recognizing human action in simple and realistic scenarios. The proposed approach includes space-time neighborhood information in addition to visual words. The main focus is to enhance the existing strengths of the BoW approach, such as view independence, scale invariance and occlusion handling. BoE includes independent pairs of neighbors for building expressions; therefore, it is tolerant to occlusion and capable of handling view independence to some extent in realistic scenarios. Our main contribution is a class-specific visual word extraction approach for establishing a relationship between the extracted visual words in both the space and time dimensions. Finally, we have carried out a set of experiments to optimize different parameters and compare performance with recent state-of-the-art methods. Our approach outperforms existing Bag of Words based approaches when evaluated using the same performance evaluation methods. We tested our approach on four publicly available datasets for human action recognition, i.e. UCF-Sports, KTH, UCF11 and UCF50, achieving average accuracies of 97.3%, 99.5%, 96.7% and 93.42%, respectively. Sergio A. Velastin has received funding from the Universidad Carlos III de Madrid, the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement nº 600371, el Ministerio de Economía, Industria y Competitividad (COFUND2013-51509), el Ministerio de Educación, Cultura y Deporte (CEI-15-17) and Banco Santander.
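
    The sketch below gives a highly simplified picture of how expressions can be built from visual words and space-time neighbours: local descriptors are clustered into words, each interest point is paired with its nearest space-time neighbour, and the resulting word pairs are histogrammed. The descriptors and coordinates are random placeholders, and the class-specific codebook learning described above is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
desc = rng.random((300, 128))            # local spatio-temporal descriptors of one video
xyt = rng.random((300, 3))               # (x, y, t) position of each descriptor

k = 20
words = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(desc)

# Pair each interest point with its nearest neighbour in space-time.
tree = cKDTree(xyt)
_, nn = tree.query(xyt, k=2)             # nn[:, 1] = closest other point
pair_ids = words * k + words[nn[:, 1]]   # encode each (word_i, word_j) pair as one id
hist, _ = np.histogram(pair_ids, bins=k * k, range=(0, k * k))
print(hist.sum())                        # bag-of-expression histogram for this video
```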

    Exploiting deep residual networks for human action recognition from skeletal data

    The computer vision community is currently focusing on solving action recognition problems in real videos, which contain thousands of samples and many challenges. In this process, Deep Convolutional Neural Networks (D-CNNs) have played a significant role in advancing the state-of-the-art in various vision-based action recognition systems. Recently, the introduction of residual connections in conjunction with a more traditional CNN model in a single architecture called the Residual Network (ResNet) has shown impressive performance and great potential for image recognition tasks. In this paper, we investigate and apply deep ResNets for human action recognition using skeletal data provided by depth sensors. Firstly, the 3D coordinates of the human body joints carried in skeleton sequences are transformed into image-based representations and stored as RGB images. These color images are able to capture the spatio-temporal evolutions of 3D motions from skeleton sequences and can be efficiently learned by D-CNNs. We then propose a novel deep learning architecture based on ResNets to learn features from the obtained color-based representations and classify them into action classes. The proposed method is evaluated on three challenging benchmark datasets: MSR Action 3D, KARD, and NTU-RGB+D. Experimental results demonstrate that our method achieves state-of-the-art performance on all these benchmarks whilst requiring fewer computational resources. In particular, the proposed method surpasses previous approaches by a significant margin of 3.4% on the MSR Action 3D dataset, 0.67% on the KARD dataset, and 2.5% on the NTU-RGB+D dataset.
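
    A minimal sketch of the learning stage, assuming colour-coded skeleton images as input: a torchvision ResNet-18 (a stand-in for the paper's custom ResNet-based architecture) with its final layer replaced for the action classes, trained with a standard cross-entropy objective on a random batch.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

num_classes = 60                        # e.g. NTU-RGB+D action classes
model = resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, num_classes)

images = torch.randn(8, 3, 224, 224)    # batch of colour-coded skeleton images (random here)
labels = torch.randint(0, num_classes, (8,))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# One illustrative training step.
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(float(loss))
```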

    Fall detection and activity recognition using human skeleton features

    Human activity recognition has attracted the attention of researchers around the world. This is an interesting problem that can be addressed in different ways, and many approaches have been presented in recent years. These applications present solutions to recognize different kinds of activities, such as whether the person is walking, running, jumping, jogging, or falling, among others. Amongst all these activities, fall detection has special importance because it is a common dangerous event for people of all ages, with a more negative impact on the elderly population. Usually, these applications use sensors to detect sudden changes in the movement of the person. These kinds of sensors can be embedded in smartphones, necklaces, or smart wristbands to make them “wearable” devices. The main inconvenience is that these devices have to be placed on the subjects’ bodies. This might be uncomfortable and is not always feasible, because this type of sensor must be monitored constantly and cannot be used in open spaces with unknown people. For these reasons, fall detection from video camera images presents some advantages over wearable sensor-based approaches. This paper presents a vision-based approach to fall detection and activity recognition. The main contribution of the proposed method is to detect falls using only images from a standard video camera, without the need for environmental sensors; detection relies on human skeleton estimation for feature extraction. The use of human skeleton detection opens the possibility of detecting not only falls but also different kinds of activities for several subjects in the same scene, so the approach can be used in real environments where a large number of people may be present at the same time. The method is evaluated on the public UP-FALL dataset and surpasses the performance of other fall detection and activity recognition systems that use that dataset.
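
    As an illustration only, the sketch below reduces per-frame skeleton keypoints (as produced by a pose estimator such as OpenPose) to two simple features and feeds them to a classifier. The joint indices, features and training data are hypothetical placeholders, not the features used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def frame_features(kpts):
    """kpts: (J, 2) image coordinates of one person's joints in one frame."""
    head, hip = kpts[0], kpts[8]                 # indices assume an OpenPose-like layout
    height = np.ptp(kpts[:, 1])                  # vertical extent of the skeleton
    width = np.ptp(kpts[:, 0])
    return np.array([head[1] - hip[1],           # head-to-hip vertical offset
                     width / (height + 1e-6)])   # aspect ratio: large when lying down

# Placeholder training data: random skeletons labelled fall / no-fall.
rng = np.random.default_rng(0)
X = np.stack([frame_features(rng.random((25, 2))) for _ in range(200)])
y = rng.integers(0, 2, 200)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict(X[:5]))
```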

    Dynamic Spatio-Temporal Bag of Expressions (D-STBoE) model for human action recognition

    This article belongs to the Section Intelligent Sensors. Human action recognition (HAR) has emerged as a core research domain for video understanding and analysis, thus attracting many researchers. Although significant results have been achieved in simple scenarios, HAR is still a challenging task due to issues associated with view independence, occlusion and inter-class variation observed in realistic scenarios. In previous research efforts, the classical bag of visual words approach, along with its variations, has been widely used. In this paper, we propose a Dynamic Spatio-Temporal Bag of Expressions (D-STBoE) model for human action recognition without compromising the strengths of the classical bag of visual words approach. Expressions are formed based on the density of a spatio-temporal cube around a visual word. To handle inter-class variation, we use class-specific visual word representation for visual expression generation. In contrast to the Bag of Expressions (BoE) model, the formation of visual expressions is based on the density of spatio-temporal cubes built around each visual word, as constructing neighborhoods with a fixed number of neighbors could include non-relevant information, making a visual expression less discriminative in scenarios with occlusion and changing viewpoints. Thus, the proposed approach makes the model more robust to the occlusion and changing-viewpoint challenges present in realistic scenarios. Furthermore, we train a multi-class Support Vector Machine (SVM) to classify bags of expressions into action classes. Comprehensive experiments on four publicly available datasets (KTH, UCF Sports, UCF11 and UCF50) show that the proposed model outperforms existing state-of-the-art human action recognition methods in terms of accuracy, achieving 99.21%, 98.60%, 96.94% and 94.10%, respectively. Sergio A. Velastin is grateful for funding received from the Universidad Carlos III de Madrid, the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement N° 600371, el Ministerio de Economía, Industria y Competitividad (COFUND2013-51509), el Ministerio de Educación, Cultura y Deporte (CEI-15-17) and Banco Santander. Muhammad Haroon Yousaf received funding from the Higher Education Commission, Pakistan, for the Swarm Robotics Lab under the National Centre for Robotics and Automation (NCRA). The authors also acknowledge support from the Directorate of ASR&TD, University of Engineering and Technology Taxila, Pakistan.
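
    The density-driven neighbourhood idea can be sketched as follows: instead of taking a fixed number of nearest neighbours, an expression is formed from whatever interest points fall inside a spatio-temporal cube around each visual word. The radius, positions and word labels below are assumed placeholders, not values from the paper.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
xyt = rng.random((300, 3))                # (x, y, t) positions of interest points in one video
words = rng.integers(0, 20, 300)          # visual-word label assigned to each point

tree = cKDTree(xyt)
radius = 0.1                              # half-width of the spatio-temporal cube (assumed)
expressions = []
for i, pt in enumerate(xyt):
    # A Chebyshev (p = inf) ball of radius r is an axis-aligned cube of side 2r,
    # so the neighbourhood size adapts to the local density of interest points.
    inside = [j for j in tree.query_ball_point(pt, radius, p=np.inf) if j != i]
    expressions.append((int(words[i]), tuple(sorted(int(words[j]) for j in inside))))
print(expressions[0])
```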

    Learning to Recognize 3D Human Action from A New Skeleton-based Representation Using Deep Convolutional Neural Networks

    Recognizing human actions in untrimmed videos is an important and challenging task. An effective 3D motion representation and a powerful learning model are two key factors influencing recognition performance. In this paper, we introduce a new skeleton-based representation for 3D action recognition in videos. The key idea of the proposed representation is to transform the 3D joint coordinates of the human body carried in skeleton sequences into RGB images via a color encoding process. By normalizing the 3D joint coordinates and dividing each skeleton frame into five parts, where the joints are concatenated according to the order of their physical connections, the color-coded representation is able to represent spatio-temporal evolutions of complex 3D motions, independently of the length of each sequence. We then design and train different Deep Convolutional Neural Networks (D-CNNs) based on the Residual Network architecture (ResNet) on the obtained image-based representations to learn 3D motion features and classify them into classes. Our method is evaluated on two widely used action recognition benchmarks: MSR Action3D and NTU-RGB+D, a very large-scale dataset for 3D human action recognition. The experimental results demonstrate that the proposed method outperforms previous state-of-the-art approaches whilst requiring less computation for training and prediction. This research was carried out at the Cerema Research Center (CEREMA) and the Toulouse Institute of Computer Science Research (IRIT), Toulouse, France. Sergio A. Velastin is grateful for funding received from the Universidad Carlos III de Madrid, the European Union’s Seventh Framework Programme for Research, Technological Development and demonstration under grant agreement N. 600371, el Ministerio de Economía, Industria y Competitividad (COFUND2013-51509), el Ministerio de Educación, Cultura y Deporte (CEI-15-17) and Banco Santander.

    Federated learning enables big data for rare cancer boundary detection.

    Although machine learning (ML) has shown promise across disciplines, out-of-sample generalizability is concerning. This is currently addressed by sharing multi-site data, but such centralization is challenging or infeasible to scale due to various limitations. Federated ML (FL) provides an alternative paradigm for accurate and generalizable ML by sharing only numerical model updates. Here we present the largest FL study to date, involving data from 71 sites across 6 continents, to generate an automatic tumor boundary detector for the rare disease of glioblastoma, reporting the largest such dataset in the literature (n = 6,314). We demonstrate a 33% delineation improvement for the surgically targetable tumor, and 23% for the complete tumor extent, over a publicly trained model. We anticipate our study to: 1) enable more healthcare studies informed by large diverse data, ensuring meaningful results for rare diseases and underrepresented populations; 2) facilitate further analyses for glioblastoma by releasing our consensus model; and 3) demonstrate the effectiveness of FL at such scale and task complexity as a paradigm shift for multi-site collaborations, alleviating the need for data sharing.
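
    The aggregation principle behind FL can be illustrated with a toy federated-averaging step: each site contributes only its model parameters and local case count, and the aggregator forms a weighted average, so no imaging data leaves any site. The weights and site sizes below are synthetic, and the sketch is not the study's actual aggregation protocol.

```python
import numpy as np

def federated_average(site_weights, site_sizes):
    """Weighted average of per-site model parameters, proportional to local data size."""
    total = sum(site_sizes)
    return sum(w * (n / total) for w, n in zip(site_weights, site_sizes))

rng = np.random.default_rng(0)
sites = [rng.normal(size=1000) for _ in range(71)]   # one parameter vector per site (synthetic)
sizes = rng.integers(10, 500, 71)                    # local case counts per site (synthetic)
global_model = federated_average(sites, sizes)
print(global_model.shape)
```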